Abstract

We examine how the prior probability distribution of a sensory variable in the environment influences 
the optimal allocation of neurons and spikes in a population that represents that variable. We start with 
a conventional response model, in which the spikes of each neuron are drawn from a Poisson distribution 
with a mean rate governed by an associated tuning curve. For this model, we approximate the Fisher 
information in terms of the density and amplitude of the tuning curves, under the assumption that 
tuning width varies inversely with cell density. We consider a family of objective functions based on the 
expected value, over the sensory prior, of a functional of the Fisher information. This family includes 
lower bounds on mutual information and perceptual discriminability as special cases. For all cases, 
we obtain a closed form expression for the optimum, in which the density and gain of the cells in the 
population are power law functions of the stimulus prior. Thus, the allocation of these resources is 
uniquely specified by the prior. Since perceptual discriminability may be expressed directly in terms of 
the Fisher information, it too will be a power law function of the prior. We show that these results hold 
for tuning curves of arbitrary shape and correlated neuronal variability. This framework thus provides 
direct and experimentally testable predictions regarding the relationship between sensory priors, tuning 
properties of neural representations, and perceptual discriminability. 

1 Introduction 

Many bottom up theories of neural encoding posit that sensory systems are optimized to represent signals 
that occur in the natural environment of an organism [Ud]. A precise specification of the optimality of a 
sensory representation requires four components: (1) the family of neural transformations (that dictate how 
natural signals are encoded in neural activity), over which the optimum is to be taken; (2) the types of 
signals that are to be encoded, and their prior distribution in the natural environment; (3) the noise present 
in the input signals, and the additional noise that is introduced by the neural transformations; and (4) the 
costs (e.g., metabolic) of building, operating, and maintaining the system [3]. Although an optimal solution
can be computed for some simple choices of these components (e.g., linear response models and Gaussian
signal and noise distributions [U[S]), the general problem is intractable.

A substantial literature has considered simple population coding models in which each neuron's mean
response to a scalar variable is characterized by a tuning curve [e.g., IMT4]. For these models, several papers
have examined the optimization of Fisher information, which expresses a bound on the mean squared error
of an unbiased estimator [TSHTg]. In these results, the distribution of sensory variables was assumed to be
uniform, and the populations were assumed to be homogeneous with regard to tuning curve shape, spacing, 
and amplitude. 

The distribution of sensory variables encountered in the environment is often non-uniform, and it is thus of 
interest to understand how these variations in probability affect the design of optimal populations. It would 
seem natural that a neural system should devote more resources to regions of sensory space that occur with 
higher probability, analogous to results in coding theory [19]. At the single neuron level, several publications
describe solutions in which monotonic neural response functions allocate greater dynamic range to higher
probability stimuli [20-23]. At the population level, non-uniform allocations of neurons with identical tuning
curves have been shown to be optimal for non-uniform stimulus distributions [24, 25].

Here, we examine the influence of a sensory prior on the optimal allocation of neurons and spikes in a
population, and the implications of this optimal allocation for subsequent perception. Given a prior distribution
over a scalar stimulus parameter, and a resource budget of N neurons with an average of R spikes/sec
for the entire population, we seek the optimal shapes, positions, and amplitudes of the tuning curves. We 
assume a population with Poisson-like spiking (which may include correlations), and consider a family of 
objective functions based on Fisher information. This family includes lower bounds on mutual information, 
and the minimum attainable perceptual discrimination performance as special cases. We then approximate 
the Fisher information in terms of two continuous resource variables, the density and gain of the tuning 
curves. This approximation allows us to obtain a closed form solution for the optimal population. For all 
objective functions, we find that the optimal tuning curve properties (cell density, tuning width, and gain) 
are power-law functions of the stimulus prior, with exponents dependent on the specific choice of objective 
function. Through the Fisher information, we also derive a bound on perceptual discriminability, again in
the form of a power law of the stimulus prior. Thus, our framework provides direct and experimentally testable
links between sensory priors, tuning properties of optimal neural representations, and perceptual
discriminability. This work was initially presented in [26, 27], and portions appear in the doctoral dissertation of
the first author [28].



2 Encoding model and resource constraints 

We begin with a conventional model for a population of N neurons responding to a single scalar variable, s 
[e.g., I6HT4]. We assume initially that the number of spikes emitted (per unit time) by the nth neuron is a
sample from an independent Poisson process, with mean rate determined by its tuning function, $h_n(s)$. The
probability density of the population response can be written as

$$p(\vec{r} \mid s) = \prod_{n=1}^{N} \frac{h_n(s)^{r_n}}{r_n!} \, e^{-h_n(s)}. \qquad (1)$$

For now, we also assume that the tuning functions can be described by unimodal functions of arbitrary 
shape. We generalize this analysis to the case of monotonic (saturating) tuning curves of arbitrary shape in
section 5.1, and in section 5.2 we consider more general response models that can include non-Poisson and
correlated spiking.

We assume the total expected spike rate, R, of the population is fixed, which places a constraint on the 
tuning curves: 

$$\int p(s) \sum_{n=1}^{N} h_n(s) \, ds = R, \qquad (2)$$

where p(s) is the probability distribution of stimuli in the environment, and can have an arbitrary form. 
We refer to this as a sensory prior, in anticipation of its future use in Bayesian decoding of the population 
response. 
3 Objective function



We now ask: what is the best way to represent values drawn from p(s) given the limited resources of N 
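neurons and R total spikes? To formulate a family of objective functions that depend on both p(s) and
the tuning curves, we first rely on Fisher information, $I_f(s)$, which is defined as [29]

$$I_f(s) = -\int p(\vec{r} \mid s) \, \frac{\partial^2}{\partial s^2} \log p(\vec{r} \mid s) \, d\vec{r}.$$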

The Fisher information provides a measure of how accurately the population response represents the stimulus 
parameter, based on the encoding model. It has been used to answer theoretical questions about the 
influence of tuning curve shapes [15, 16, 30] and response variability [31, 32] on the representational accuracy
of population codes. It has also been used in neurophysiological studies to quantify changes in coding
accuracy resulting from changes in tuning curve shapes during adaptation [33-35]. For the independent
Poisson noise model, the Fisher information can be expressed analytically as [5]

$$I_f(s) = \sum_{n=1}^{N} \frac{h_n'^2(s)}{h_n(s)},$$

where $h_n'(s)$ is the derivative of the nth tuning curve.

The Fisher information can also be used to express lower bounds on mutual information [24], the variance of
an unbiased estimator [25], and perceptual discriminability [36]. Specifically, the mutual information, $I(\vec{r}; s)$,
is bounded by:

$$I(\vec{r}; s) \ge H(s) - \frac{1}{2} \int p(s) \log\left( \frac{2 \pi e}{I_f(s)} \right) ds, \qquad (3)$$

where H(s) is the entropy, or amount of information inherent in p(s), which is independent of the neural 
population. The bound is tight in the limit of low noise, which can occur as N increases, R increases, or 
both HQ. 

The Cramér-Rao inequality allows us to express the minimum expected squared stimulus discriminability
achievable by any decoder:

$$\delta^2 \ge \Delta^2 \int \frac{p(s)}{I_f(s)} \, ds. \qquad (4)$$

The constant $\Delta$ determines the performance level at threshold in a discrimination task. The conventional
Cramér-Rao bound expresses the minimum mean squared error of any estimator, and in general requires a
correction for the estimator bias [29]. Here, we use it to bound the squared discriminability of the estimator,
as expressed in the stimulus space. This has the advantage that it is independent of bias [31], and that it is
easily (and commonly) measured in perceptual experiments. 

We formulate a generalized objective function that includes the Fisher bounds on information and
discriminability as special cases:

$$\arg\max_{\{h_n(s)\}} \int p(s) \, f\!\left( \sum_{n=1}^{N} \frac{h_n'^2(s)}{h_n(s)} \right) ds, \quad \text{s.t.} \quad \int p(s) \sum_{n=1}^{N} h_n(s) \, ds = R, \qquad (5)$$

where $f(\cdot)$ is either the logarithm, or a power function. When $f(x) = \log(x)$, optimizing Eq. (5) is equivalent
to maximizing the lower bound on mutual information given in Eq. (3). We refer to this as the infomax
objective function. Otherwise, we assume $f(x) = x^{\alpha}$, for some exponent $\alpha$. Optimizing Eq. (5) with $\alpha = -1$
is equivalent to minimizing the squared discriminability bound expressed in Eq. (4). We refer to this as the
discrimax objective function.
4 How to optimize?



The objective function expressed in Eq. (5) is difficult to optimize because it is non-convex. To facilitate the
optimization, we first parameterize a heterogeneous neural population by warping and rescaling a
homogeneous population, as specified by a cell density function, d(s), and a gain function, g(s). In the resulting
warped population, the tuning widths are inversely proportional to the cell density. Second, we show that
Fisher information can be closely approximated by a continuous function of density and gain. Finally,
rewriting the objective function and constraints in these terms allows us to obtain closed-form solutions for
the optimal tuning curves.



4.1 Density and gain for a homogeneous population 

If p(s) is uniform, then by symmetry, the Fisher information for an optimal neural population should also 
be uniform. We assume a convolutional population of unimodal tuning curves, evenly spaced on the unit 
lattice, such that they approximately "tile" the space: 

$$\sum_{n=1}^{N} h(s - n) \approx 1.$$

We also assume that this population has an approximately constant Fisher information: 



$$I_f(s) = \sum_{n=1}^{N} \frac{h'^2(s - n)}{h(s - n)} = \sum_{n=1}^{N} \phi(s - n) \approx I_{\text{conv}}. \qquad (6)$$

That is, we assume that the Fisher information curves for the individual neurons, $\phi(s - n)$, also tile the
stimulus space. The value of the constant, $I_{\text{conv}}$, is dependent on the details of the tuning curve shape,
h(s), which we leave unspecified. As an example, Fig. 1(a-b) shows (through numerical simulation) that
the Fisher information for a convolutional population of Gaussian tuning curves, with appropriate width,
is approximately constant. Now we introduce two variables, a gain (g), and a density (d), that affect the
convolutional population as follows:

$$h_n(s) = g \, h\!\left( d \left( s - \frac{n}{d} \right) \right). \qquad (7)$$

The gain modulates the maximum average firing rate of each neuron in the population. The density controls 
both the spacing and width of the tuning curves: as the density increases, the tuning curves become narrower, 
and are spaced closer together so as to maintain their tiling of stimulus space. The effect of these two 
parameters on Fisher information is: 

$$\begin{aligned} I_f(s) &= d^2 g \sum_{n=1}^{N(d)} \phi(ds - n) \\ &\approx d^2 g \, I_{\text{conv}}. \end{aligned}$$

The second line follows from the assumption of Eq. (6), that the Fisher information of the convolutional
population is approximately constant with respect to s. 
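
The scaling of Fisher information with density and gain can be checked numerically. The following is a minimal sketch, not the simulation behind Fig. 1: it assumes a Gaussian prototype of width 0.55 normalized so that unit-spaced copies sum to roughly one, and the function names, the population size of 200, and the values d = 2 and g = 3 are illustrative choices only.

```python
import numpy as np

SIGMA = 0.55  # tuning width quoted for the Gaussian example in Fig. 1

def h(x, sigma=SIGMA):
    """Prototype Gaussian tuning curve; unit-spaced copies sum to ~1."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def phi(x, sigma=SIGMA):
    """Single-neuron Fisher information h'(x)^2 / h(x), closed form for a Gaussian."""
    return (x / sigma**2) ** 2 * h(x, sigma)

def fisher_homogeneous(s, d=1.0, g=1.0, n_cells=200):
    """Fisher information of h_n(s) = g h(d s - n), i.e. d^2 g sum_n phi(d s - n)."""
    n = np.arange(1, n_cells + 1)
    return d**2 * g * phi(d * s[:, None] - n[None, :]).sum(axis=1)

# Interior of the unit-spaced population (far from the first and last neuron).
s_unit = np.linspace(50.0, 60.0, 1001)
tiling = h(s_unit[:, None] - np.arange(1, 201)[None, :]).sum(axis=1)
I_conv = fisher_homogeneous(s_unit).mean()   # numerical estimate of the constant

# The same neurons after increasing the density to d = 2 and the gain to g = 3.
s_dense = np.linspace(25.0, 30.0, 1001)
I_scaled = fisher_homogeneous(s_dense, d=2.0, g=3.0).mean()

print(f"largest tiling deviation from 1: {np.abs(tiling - 1).max():.4f}")
print(f"estimated I_conv: {I_conv:.3f}")
print(f"I_scaled / I_conv: {I_scaled / I_conv:.3f}  (d^2 g = 12)")
```

The closed form used for `phi` avoids dividing by tuning curve values that underflow far from a neuron's preferred stimulus.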

The total resources, N and R, naturally constrain d and g, respectively. If the original (unit-spacing) 
convolutional population is supported on the interval (0, Q) of the stimulus space, then the number of neurons
in the modulated population must be N(d) = Qd to cover the same interval. Under the assumption that the
tuning curves tile the stimulus space, Eq. (2) implies that R = g for the modulated population.
Figure 1: Construction of a heterogeneous population of neurons. (a) Homogeneous population
with Gaussian tuning curves on the unit lattice. The tuning width of $\sigma = 0.55$ is chosen
so that the curves approximately tile the stimulus space. (b) The Fisher information of the
convolutional population (green) is approximately constant. (c) Inset shows d(s), the tuning
curve density. The cumulative integral of this density, D(s), alters the positions and widths
of the tuning curves in the convolutional population. (d) The warped population, with tuning
curve peaks (aligned with tick marks, at locations $s_n = D^{-1}(n)$), is scaled by the gain
function, g(s) (blue). A single tuning curve is highlighted (red) to illustrate the effect of the
warping and scaling operations. (e) The Fisher information of the inhomogeneous population
is approximately proportional to $d^2(s) g(s)$.

4.2 Density and gain for a heterogeneous population 

Intuitively, if p(s) is non-uniform, the optimal Fisher information should also be non-uniform. But note that 
this could potentially be achieved through inhomogeneities in either the tuning curve density or gain, and it 
is not obvious a priori what combination of these two functions would yield the best solution. 

To solve for an optimal heterogeneous population, we generalize density and gain to be continuous functions 
of the stimulus, d(s) and g(s), that warp and scale the convolutional population: 

$$h_n(s) = g(s_n) \, h(D(s) - n). \qquad (8)$$

Here, $D(s) = \int_{-\infty}^{s} d(t) \, dt$, the cumulative integral of d(s), warps the shape of the prototype tuning curve. The
value $s_n = D^{-1}(n)$ represents the preferred stimulus value of the (warped) nth tuning curve (Fig. 1(b-d)).
Note that the warped population retains the tiling properties of the original convolutional population. As
in the uniform case, the density controls both the spacing and width of the tuning curves. This can be seen
by rewriting Eq. (8) as a first-order Taylor expansion of D(s) around $s_n$:

$$h_n(s) \approx g(s_n) \, h(d(s_n)(s - s_n)),$$
                                        Infomax                   Discrimax                      General

Optimized function:                     $f(x) = \log x$           $f(x) = -x^{-1}$               $f(x) = -x^{\alpha}$, $\alpha < 1/3$

Density (Tuning width)^{-1}  d(s)       $N p(s)$                  $\propto N p^{1/2}(s)$         $\propto N p^{(\alpha-1)/(3\alpha-1)}(s)$
Gain  g(s)                              $R$                       $\propto R p^{-1/2}(s)$        $\propto R p^{2\alpha/(1-3\alpha)}(s)$
Fisher information  $I_f(s)$            $\propto R N^2 p^2(s)$    $\propto R N^2 p^{1/2}(s)$     $\propto R N^2 p^{2/(1-3\alpha)}(s)$
Discriminability bound  $\delta_{\min}(s)$   $\propto p^{-1}(s)$       $\propto p^{-1/4}(s)$          $\propto p^{1/(3\alpha-1)}(s)$

Table 1: Optimal heterogeneous population properties, for objective functions specified by
Eq. (11).



which is a generalization of Eq. (7).

We can now write the Fisher information of the heterogeneous population of neurons in Eq. (8) as

$$\begin{aligned} I_f(s) &= \sum_{n=1}^{N} d^2(s) \, g(s_n) \, \phi(D(s) - n) \qquad (9) \\ &\approx d^2(s) \, g(s) \, I_{\text{conv}}. \qquad (10) \end{aligned}$$

In addition to assuming that the Fisher information is approximately constant (Eq. (6)), we have also
assumed that g(s) is smooth relative to the width of $\phi(D(s) - n)$ for all n, so that we can approximate $g(s_n)$
as g(s) and remove it from the sum. The end result is an approximation of Fisher information in terms of
the continuous parameterization of cell density and gain. As earlier, the constant $I_{\text{conv}}$ is determined by the
precise shape of the tuning curves.
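
To illustrate how well this approximation can hold, the sketch below builds a warped and scaled population following Eq. (8) and compares its Poisson Fisher information, computed from numerically differentiated tuning curves, against $d^2(s) g(s) I_{\text{conv}}$. Everything specific here is an assumption made for illustration: the standard normal prior, N = 40, the density d(s) = N p(s), the smooth gain 1 + 0.5 tanh(s), and the Gaussian prototype (for which $I_{\text{conv}} \approx 1/\sigma^2$).

```python
import numpy as np

SIGMA = 0.55   # prototype Gaussian width, as in the Fig. 1 example
N = 40         # assumed number of neurons

def h(x):
    """Prototype Gaussian tuning curve; unit-spaced copies sum to ~1."""
    return np.exp(-0.5 * (x / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))

s = np.linspace(-4.0, 4.0, 8001)
prior = np.exp(-0.5 * s**2) / np.sqrt(2.0 * np.pi)   # assumed standard normal prior
d = N * prior                                        # assumed density function d(s)
g = 1.0 + 0.5 * np.tanh(s)                           # assumed smooth gain function g(s)

# Cumulative integral D(s) of the density (trapezoid rule) and preferred stimuli s_n = D^{-1}(n).
D = np.concatenate([[0.0], np.cumsum(0.5 * (d[1:] + d[:-1]) * np.diff(s))])
n = np.arange(1, N)
s_n = np.interp(n, D, s)
g_n = np.interp(s_n, s, g)

# Warped and scaled tuning curves, Eq. (8): h_n(s) = g(s_n) h(D(s) - n).
curves = g_n[None, :] * h(D[:, None] - n[None, :])
slopes = np.gradient(curves, s, axis=0)              # numerical derivative along s

# Poisson Fisher information versus the continuous approximation of Eq. (10).
fisher_exact = np.sum(slopes**2 / np.maximum(curves, 1e-300), axis=1)
fisher_approx = d**2 * g / SIGMA**2

interior = np.abs(s) < 1.0
rel_err = np.abs(fisher_exact - fisher_approx)[interior] / fisher_approx[interior]
print(f"max relative error for |s| < 1: {rel_err.max():.3f}")
```

The residual error in this setup is dominated by the imperfect tiling of the single-neuron Fisher information curves, so it shrinks or grows with the prototype width rather than with the grid resolution.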

As in the homogeneous case, the global resource values N and R will place constraints on d(s) and g(s), 
respectively. In particular, we require that $D(\cdot)$ map the entire input space onto the range [1, N]. Thus, for
an input space covering the real line, we require $D(-\infty) = 1$ and $D(\infty) = N$ (or equivalently, $\int d(s) \, ds = N$).
To attain the proper rate, we use the fact that the warped tuning curves sum to unity (before multiplication
by the gain function), along with Eq. (2), to obtain the constraint $\int p(s) g(s) \, ds = R$.



4.3 Objective function and solution for a heterogeneous population 

Approximating Fisher information as proportional to squared density and gain allows us to re-write the 
objective function and resource constraints of Eq. ([5]) as 

$$\arg\max_{d(s),\, g(s)} \int p(s) \, f\!\left( d^2(s) \, g(s) \right) ds, \quad \text{s.t.} \quad \int d(s) \, ds = N, \quad \text{and} \quad \int p(s) \, g(s) \, ds = R. \qquad (11)$$

A closed-form optimum of this objective function is easily determined using calculus of variations.
Specifically, one can compute the gradient of the Lagrangian, set it to zero, and solve the resulting system of
equations (see Appendix A). Solutions are provided in Table 1 for the infomax, discrimax, and general
power cases.
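
As a rough sketch of how these power-law solutions could be evaluated for a given prior, the snippet below raises a sampled prior to the Table 1 exponents and rescales the results so that the two resource constraints hold on the grid. The function names, the truncated Gaussian prior, and the budget values N = 100 and R = 50 are assumptions for illustration. It also evaluates the discrimax cost $\int p(s) / (d^2(s) g(s)) \, ds$ for the infomax and discrimax allocations, which should favor the discrimax allocation.

```python
import numpy as np

def integrate(y, x):
    """Trapezoid rule on a (possibly non-uniform) grid."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def optimal_allocation(prior, s, N, R, alpha=None):
    """Optimal density d(s) and gain g(s) for unimodal tuning curves (Table 1).

    alpha=None selects infomax (f = log); otherwise f(x) = -x**alpha, alpha < 1/3.
    """
    if alpha is None:
        d_exp, g_exp = 1.0, 0.0
    else:
        if alpha >= 1.0 / 3.0:
            raise ValueError("alpha must be less than 1/3")
        d_exp = (alpha - 1.0) / (3.0 * alpha - 1.0)
        g_exp = 2.0 * alpha / (1.0 - 3.0 * alpha)
    d = prior**d_exp
    d *= N / integrate(d, s)             # enforce  int d(s) ds = N
    g = prior**g_exp
    g *= R / integrate(prior * g, s)     # enforce  int p(s) g(s) ds = R
    return d, g

def expected_inverse_fisher(prior, d, g, s):
    """Discrimax cost  int p(s) / (d^2(s) g(s)) ds, up to the constant 1 / I_conv."""
    return integrate(prior / (d**2 * g), s)

s = np.linspace(-4.0, 4.0, 4001)
prior = np.exp(-0.5 * s**2)
prior /= integrate(prior, s)             # assumed (truncated) Gaussian prior

d_info, g_info = optimal_allocation(prior, s, N=100, R=50.0)
d_disc, g_disc = optimal_allocation(prior, s, N=100, R=50.0, alpha=-1.0)

print("infomax allocation, discrimax cost  :", expected_inverse_fisher(prior, d_info, g_info, s))
print("discrimax allocation, discrimax cost:", expected_inverse_fisher(prior, d_disc, g_disc, s))
```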

In all cases, the solution specifies a power-law relationship between the prior, and the density and gain of 
the tuning curves. In general, all solutions allocate more neurons, with correspondingly narrower tuning 
curves, to higher-probability stimuli. In particular, the infomax solution corresponds to a population with 
constant gain, and allocates an approximately equal amount of probability mass to each neuron, as one 
might intuitively expect from coding theory (Fig. 2(a-b)). The shape of the optimal gain function depends
on the objective function: for $\alpha < 0$, neurons with lower firing rates are used to represent stimuli with higher
probabilities, and for $\alpha > 0$, neurons with higher firing rates are used for stimuli with higher probabilities.
Note also that the global resource values, N and R, enter only as scale factors on the overall solution. As a 
Figure 2: Infomax predictions of the relationship between environment, physiology, and perceptual
discriminability. (a) An example of a probability distribution over a sensory attribute,
s, which can be directly measured from the environment. (b) Tuning curves of a neural population
designed to maximize the amount of information transmitted about stimuli drawn from the
prior distribution in Panel a. (c-e) Experimentally accessible attributes of the infomax population
in Panel b, and their corresponding predictions in terms of the prior distribution (black),
as expressed in Table 1. (c) A histogram of the preferred stimuli (stimuli associated with the
peaks of the tuning curves) of the neurons is an estimate of local cell density, which should
be proportional to the prior distribution. (d) The tuning widths of the neurons (measured as
the full width at half maximum of the tuning curves) should be inversely proportional to the
prior distribution. (e) The gain, measured as the maximum average firing rate of each of the
neurons, should be constant. (f) Minimum achievable discrimination thresholds of a perceptual
system operating on the responses of an infomax population should be inversely proportional
to the prior distribution.



result, if one or both of these factors are unknown, the solution still provides a unique specification of the 
shapes of d(s) and g(s), which can be tested against experimental data (Fig. 2(c-e)).

In addition to power-law relationships between tuning properties and sensory priors, our formulation offers 
a direct relationship between the sensory prior and perceptual discriminability. This can be obtained by 
substituting the optimal solutions for d(s) and g(s) into Eq. (10), and using the resulting Fisher information
to bound the discriminability, $\delta(s) \ge \delta_{\min}(s) = \Delta / \sqrt{I_f(s)}$ [36]. The resulting expressions are provided in
Table 1. In general, the solutions predict that discrimination thresholds should be lower for more frequently
occurring stimuli. In particular, the infomax solution predicts that inverse thresholds (discriminability)
should be directly proportional to the prior (Fig. 2(f)).
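
The predicted threshold exponents can be recovered numerically by fitting a line to log threshold against log prior. The sketch below is only illustrative: it assumes a uniform grid, a standard normal prior, and placeholder values $I_{\text{conv}} = 1$ and $\Delta = 1$ (neither affects the fitted slope); the function name is hypothetical.

```python
import numpy as np

def discrimination_threshold(prior, s, N, R, alpha=None, I_conv=1.0, Delta=1.0):
    """delta_min(s) = Delta / sqrt(I_f(s)), with I_f(s) = d^2(s) g(s) I_conv (Eq. 10)."""
    ds = s[1] - s[0]                       # assumes a uniformly spaced grid
    if alpha is None:                      # infomax exponents from Table 1
        d_exp, g_exp = 1.0, 0.0
    else:                                  # power-law objective, alpha < 1/3
        d_exp = (alpha - 1.0) / (3.0 * alpha - 1.0)
        g_exp = 2.0 * alpha / (1.0 - 3.0 * alpha)
    d = prior**d_exp
    d = N * d / np.sum(d * ds)             # int d(s) ds = N  (rectangle rule)
    g = prior**g_exp
    g = R * g / np.sum(prior * g * ds)     # int p(s) g(s) ds = R
    return Delta / np.sqrt(d**2 * g * I_conv)

s = np.linspace(-3.0, 3.0, 601)
prior = np.exp(-0.5 * s**2) / np.sqrt(2.0 * np.pi)   # assumed standard normal prior

slopes = {}
for name, alpha in [("infomax", None), ("discrimax", -1.0)]:
    delta_min = discrimination_threshold(prior, s, N=100, R=50.0, alpha=alpha)
    # Slope of log threshold versus log prior is the power-law exponent.
    slopes[name] = np.polyfit(np.log(prior), np.log(delta_min), 1)[0]
    print(f"{name}: fitted exponent = {slopes[name]:.3f}")
```

In an experimental setting the same log-log fit could be applied to measured thresholds and an estimated prior, with the caveat that the bound need not be tight.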



5 Extensions 

5.1 Monotonic tuning curves 

Thus far we have solved for the optimal cell density and gain for warping and scaling a homogeneous
population of unimodal tuning curves. However, many neurons exhibit monotonic tuning to intensity variables
such as contrast, or sound pressure level. The influence of continuous cell density and gain on the Fisher
information of a homogeneous population of monotonic tuning curves is the same as in the unimodal case
(Eq. (10)), again assuming that the Fisher information curves of the homogeneous population tile. The
constraint on N is also the same. However, the total spiking cost fundamentally differs. Neurons with monotonic
tuning curves saturate, and thus the entire population will be active at the high range of stimulus values, 
which incurs a large metabolic cost for encoding these values. Intuitively, this metabolic penalty can be 
reduced by lowering the gains of neurons tuned to the low end of the stimulus range, or by adjusting the 
cell density such that there are more tuning curves tuned to the high end of the stimulus range. It is not 
obvious how the reductions in metabolic cost for these coding strategies should trade off with the optimal 
coding of sensory information. 

To derive the optimal monotonic coding scheme, we first parameterize a heterogeneous population of
monotonic tuning curves by warping and scaling the derivatives of a homogeneous population of monotonic tuning
curves:

$$h_n(s) = \int_{-\infty}^{s} h_n'(t) \, dt = \int_{-\infty}^{s} g(s_n) \, d(t) \, h'(D(t) - n) \, dt. \qquad (12)$$

This expression is similar to the parameterization of a heterogeneous population of unimodal tuning curves 
(Eq. (8)), except here, $h(\cdot)$ is now a prototype monotonic tuning curve. The density controls both the number
of tuning curves and their slopes, which are inversely proportional to the cell density. The derivatives of
the (warped) monotonic tuning curves, $h'(D(t) - n)$, will be unimodal functions, allowing us to use similar
approximations and intuitions developed for the unimodal case. In particular, we assume that the derivatives
of the tuning curves tile such that $\sum_{n=1}^{N} h'(D(t) - n) \approx 1$.

The total spike count can be expressed from Eqs. (2) & (12) as,

$$R = \int_{-\infty}^{\infty} p(s) \int_{-\infty}^{s} d(t) \sum_{n=1}^{N} g(s_n) \, h'(D(t) - n) \, dt \, ds.$$

We define a continuous version of the gain as $g(t) = \sum_{n=1}^{N} g(s_n) \, h'(D(t) - n)$, which allows us to approximate
the total number of spikes as

$$\begin{aligned} R &= \int_{-\infty}^{\infty} p(s) \int_{-\infty}^{s} d(t) \, g(t) \, dt \, ds \\ &= \int_{-\infty}^{\infty} (1 - P(s)) \, d(s) \, g(s) \, ds. \end{aligned}$$

In the second step, we performed integration by parts and defined $P(s) = \int_{-\infty}^{s} p(t) \, dt$ as the cumulative
distribution function of the sensory prior. The constraint on the total number of spikes is very different from the
bell-shaped tuning curve case, as it now depends on the cell density and the cumulative distribution of the 
sensory prior, and will thus affect the optimal solutions for cell density and gain. 
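
The integration-by-parts step can be sanity-checked numerically. The sketch below evaluates both forms of the spike-count integral on a grid; the standard normal prior and the particular smooth density and gain functions are arbitrary choices made for illustration, not quantities taken from the text.

```python
import numpy as np

def cumulative(y, x):
    """Running trapezoid integral of y from x[0] to each grid point."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))])

s = np.linspace(-6.0, 6.0, 12001)
prior = np.exp(-0.5 * s**2) / np.sqrt(2.0 * np.pi)   # assumed standard normal prior
d = 20.0 * np.exp(-0.5 * (s / 2.0) ** 2)             # arbitrary smooth density d(s)
g = 1.0 + 0.5 * np.sin(s)                            # arbitrary smooth positive gain g(s)

P = cumulative(prior, s)                             # cumulative distribution P(s)
inner = cumulative(d * g, s)                         # int_{-inf}^{s} d(t) g(t) dt

R_double = cumulative(prior * inner, s)[-1]          # int p(s) [int^s d(t) g(t) dt] ds
R_single = cumulative((1.0 - P) * d * g, s)[-1]      # int (1 - P(s)) d(s) g(s) ds

print(f"double integral: {R_double:.6f}")
print(f"single integral: {R_single:.6f}")
```

The two values agree up to quadrature error and the small probability mass outside the truncated grid.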

We reformulate the original optimization problem of Eq. (5) for monotonic tuning curves as:

$$\arg\max_{d(s),\, g(s)} \int p(s) \, f\!\left( d^2(s) \, g(s) \right) ds, \quad \text{s.t.} \quad \int d(s) \, ds = N, \quad \text{and} \quad \int (1 - P(s)) \, d(s) \, g(s) \, ds = R. \qquad (13)$$

A closed-form optimum of this objective function is easily determined by taking the gradient of the
Lagrangian, setting it to zero, and solving the resulting system of equations. Solutions are provided in Table 2
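                                        Infomax                                   Discrimax                                      General

Optimized function:                     $f(x) = \log x$                           $f(x) = -x^{-1}$                               $f(x) = -x^{\alpha}$

Density  d(s)                           $N p(s)$                                  $\propto N p^{1/3}(s) [1 - P(s)]^{1/3}$        $\propto N p^{1/(1-2\alpha)}(s) [1 - P(s)]^{\alpha/(2\alpha-1)}$
Gain  g(s)                              $R N^{-1} [1 - P(s)]^{-1}$                $R N^{-1} [1 - P(s)]^{-1}$                     $R N^{-1} [1 - P(s)]^{-1}$
Fisher information  $I_f(s)$            $\propto R N p^2(s) [1 - P(s)]^{-1}$      $\propto R N p^{2/3}(s) [1 - P(s)]^{-1/3}$     $\propto R N p^{2/(1-2\alpha)}(s) [1 - P(s)]^{1/(2\alpha-1)}$
Discriminability bound  $\delta_{\min}(s)$   $\propto p^{-1}(s) [1 - P(s)]^{1/2}$      $\propto p^{-1/3}(s) [1 - P(s)]^{1/6}$         $\propto p^{1/(2\alpha-1)}(s) [1 - P(s)]^{1/(2(1-2\alpha))}$

Table 2: Optimal heterogeneous population properties, for objective functions specified by
Eq. (13).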



for the infomax, discrimax, and general power cases, in addition to solutions for the optimal Fisher
information and the minimum discrimination thresholds achievable by a subsequent perceptual system.

For all objective functions, the solutions for the optimal density, gain, and discriminability are products of 
power law functions of the sensory prior, and its cumulative distribution. In general, all solutions allocate 
more neurons with greater dynamic range to more frequently occurring stimuli. Unlike the solutions for 
unimodal tuning curves (Table 1), the optimal gain is the same for all objective functions: for a neuron
tuned to a particular stimulus value, the optimal gain is inversely proportional to the probability, 1 - P(s),
of all stimuli that exceed that stimulus value. Intuitively, this solution allocates lower gains to neurons tuned
to the low end of the stimulus range, which is metabolically less costly. The global resource values N and
R again only appear as scale factors in the overall solution, allowing us to easily compare the predicted
relationships to experimental data, even when N and R are not known.
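
The monotonic solution can be sketched in the same way as the unimodal one. In the snippet below, the gain $R N^{-1} [1 - P(s)]^{-1}$ and the infomax density N p(s) follow the description above, while the density exponents for the power case are an assumption obtained here by setting the gradient of the Lagrangian of Eq. (13) to zero, and should be checked before being relied upon. The function names, the truncated Gaussian prior, the floor on 1 - P(s), and the budget values are illustrative.

```python
import numpy as np

def cumulative(y, x):
    """Running trapezoid integral of y from x[0] to each grid point."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))])

def monotonic_allocation(prior, s, N, R, alpha=None):
    """Density d(s) and gain g(s) for monotonic tuning curves under Eq. (13).

    alpha=None selects infomax; otherwise f(x) = -x**alpha. The power-case
    exponents are an assumption derived from the stationarity conditions.
    """
    tail = np.clip(1.0 - cumulative(prior, s), 1e-12, None)  # 1 - P(s), floored at 1e-12
    if alpha is None:
        d = prior.copy()
    else:
        k = 1.0 / (1.0 - 2.0 * alpha)
        d = prior**k * tail ** (-alpha * k)
    d *= N / cumulative(d, s)[-1]        # enforce  int d(s) ds = N
    g = R / (N * tail)                   # gain is the same for every objective
    return d, g, tail

s = np.linspace(-5.0, 5.0, 5001)
prior = np.exp(-0.5 * s**2)
prior /= cumulative(prior, s)[-1]        # assumed (truncated) Gaussian prior

for name, alpha in [("infomax", None), ("discrimax", -1.0)]:
    d, g, tail = monotonic_allocation(prior, s, N=100, R=50.0, alpha=alpha)
    spikes = cumulative(tail * d * g, s)[-1]   # int (1 - P(s)) d(s) g(s) ds
    print(f"{name}: neurons = {cumulative(d, s)[-1]:.2f}, spikes = {spikes:.2f}")
```

Because the gain grows without bound as 1 - P(s) approaches zero, any practical use of this sketch needs a floor or a truncated stimulus range, as done here.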

5.2 Generalization to Poisson-like noise distributions 

Our results depend on the assumption that neuronal variability is Poisson distributed and neural responses 
are statistically independent. In a Poisson model, the variance of the neural responses is directly proportional 
to the mean responses, which has been observed experimentally in some cases [37], but may not be true in 
general. In addition, the assumption that neuronal responses are statistically independent conditioned on 
the stimulus value is often violated [38, 39].

Here, we generalize the results to a family of "Poisson-like" response models [HMO], that allow for stimulus 
dependent correlations and an arbitrary linear relationship between mean and variance of the population 
response. We assume the probability density of the population response can be written as

$$p(\vec{r} \mid s) = f(\vec{r}) \exp\left[ \eta(s)^{T} \vec{r} - A(\eta) \right]. \qquad (14)$$

This distribution belongs to the exponential family with linear sufficient statistics, where the parameter $\eta(s)$
is a matrix of the natural parameters of the distribution with the nth column equal to $\eta_n(s)$, $A(\eta)$ is a (log)
normalizing constant that ensures the distribution integrates to one, and $f(\vec{r})$ is an arbitrary function of
the firing rates. The independent Poisson noise model considered in Eq. (1) is a member of this family of
distributions with parameters: $\eta(s) = \log h(s)$, where h(s) is a matrix of tuning curves with the nth column
given by $h_n(s)$, $A(\eta) = \sum_{n=1}^{N} \exp(\eta_n)$, and $f(\vec{r}) = \prod_{n=1}^{N} \frac{1}{r_n!}$.

All of our objective functions depend on an analytical form for the Fisher information in terms of tuning
curves, which is then expressed in terms of density and gain. To derive the Fisher information for the
response model in Eq. (14), we start by noting that the derivative of the natural parameters is related to the
stimulus-dependent covariance matrix of the population responses, $\Sigma(s)$, and the derivative of the tuning
curves as [T^IFIU]

$$\eta'(s) = \Sigma^{-1}(s) \, h'(s). \qquad (15)$$

The term $\Sigma^{-1}(s)$ is the inverse of the covariance matrix, and is often referred to as a precision matrix.

The Fisher information matrix about the natural parameters is simply equal to the covariance matrix

$$I_f[\eta(s)] = \Sigma(s). \qquad (16)$$

The local Fisher information about the stimulus, s, can be derived from the chain rule as,

$$I_f(s) = \eta'(s)^{T} \, I_f[\eta(s)] \, \eta'(s).$$

After substituting the relationships in Eqs. (15) & (16) into this expression we obtain the final expression for
the local Fisher information

$$I_f(s) = h'(s)^{T} \, \Sigma^{-1}(s) \, h'(s). \qquad (17)$$

The influence of Fisher information on coding accuracy is now directly dependent on knowledge of the
stimulus-dependent (inverse) covariance matrix. Estimating such a precision matrix from experimental data is
technically challenging (although see [39]). Here, we assume a biologically plausible precision matrix that allows
for neuronal variability to be proportional to the mean firing rate, and the responses of nearby neurons to
be correlated [31]. For a homogeneous neural population, $h_n(s) = h(s - n)$, we express each element in the
precision matrix as,

$$\Sigma^{-1}_{n,m}(s) = \frac{\alpha \, \delta_{n,m} + \beta \left( \delta_{n,m+1} + \delta_{n+1,m} \right)}{\sqrt{h(s - n) \, h(s - m)}}. \qquad (18)$$

The parameter $\alpha$ controls a linear relationship between the mean response and the variance of the response
for all the neurons. The parameter $\beta$ controls the degree of the correlations, and $\delta_{n,n} = 1$ for all n while
$\delta_{n,m} = 0$ if $n \ne m$. The Fisher information of a homogeneous population may now be expressed from
Eqs. (17) & (18) as,

$$\begin{aligned} I_f(s) &= \alpha \sum_{n=1}^{N} \frac{h'^2(s - n)}{h(s - n)} + \beta \sum_{n,\, m = n \pm 1} \frac{h'(s - n) \, h'(s - m)}{\sqrt{h(s - n) \, h(s - m)}} \\ &\approx \alpha I_{\text{conv}} + \beta I_{\text{corr}}. \end{aligned}$$

In the last step we make two assumptions. First, we assume (as for the independent Poisson case) the
Fisher information curves, $\phi(s - n)$, of the homogeneous population tile such that they sum to the constant,
$I_{\text{conv}}$. Second, we assume that the cross terms, $\frac{h'(s - n) \, h'(s - m)}{\sqrt{h(s - n) \, h(s - m)}}$, also tile such that they sum to the constant,
$I_{\text{corr}}$.



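The Fisher information for a heterogeneous population, obtained by warping and scaling the homogeneous
population by the density and gain is

$$\begin{aligned} I_f(s) &= d^2(s) \left[ \alpha \sum_{n=1}^{N} g(s_n) \, \phi(D(s) - n) + \beta \sum_{n,\, m = n \pm 1} \frac{g(s_n) \, g(s_m)}{\sqrt{g(s_n) \, g(s_m)}} \, \frac{h'(D(s) - n) \, h'(D(s) - m)}{\sqrt{h(D(s) - n) \, h(D(s) - m)}} \right] \qquad (19) \\ &\approx d^2(s) \, g(s) \left[ \alpha I_{\text{conv}} + \beta I_{\text{corr}} \right]. \qquad (20) \end{aligned}$$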
In the second step we make three assumptions. First, (as for the independent Poisson case) we assume g(s)
is smooth relative to the width of $\phi(D(s) - n)$ for all n, so that we can approximate $g(s_n)$ as g(s). Second,
we assume that the neurons are sufficiently dense such that $\frac{g(s_n) \, g(s_m)}{\sqrt{g(s_n) \, g(s_m)}} \approx g(s_n)$. Finally, we assume g(s) is
also smooth relative to the width of the cross terms. As a result, the gain factors can be approximated by
the same continuous gain function, g(s), and can be pulled out of both sums.

Given the form of the Fisher information (Eq. (20)), we conclude that the optimal solutions for the density
and gain are the same as those expressed in Tables 1 & 2, which were derived for an independent Poisson
noise model ($\alpha = 1$, $\beta = 0$). The values of the Fisher information, and minimum achievable discrimination
thresholds now depend on three additional scale factors, $\alpha$, $\beta$, and $I_{\text{corr}}$, that characterize the correlated
variability of the population code.
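
As an illustrative check of the correlated model, the sketch below builds the nearest-neighbour precision matrix for a homogeneous Gaussian population and confirms that the quadratic form $h'(s)^{T} \Sigma^{-1}(s) h'(s)$ equals the sum of diagonal and cross terms. The Gaussian prototype, the population of 20 neurons, and the parameter values 1.3 and 0.2 for the variance scale and correlation strength are assumptions chosen for the example.

```python
import numpy as np

SIGMA = 0.55  # assumed Gaussian prototype width

def h(x):
    """Prototype Gaussian tuning curve."""
    return np.exp(-0.5 * (x / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))

def h_prime(x):
    """Derivative of the prototype tuning curve."""
    return -x / SIGMA**2 * h(x)

def precision_matrix(rates, alpha, beta):
    """Eq. (18): alpha on the diagonal, beta between nearest neighbours, over sqrt(h_n h_m)."""
    n = rates.size
    pattern = alpha * np.eye(n) + beta * (np.eye(n, k=1) + np.eye(n, k=-1))
    root = np.sqrt(rates)
    return pattern / (root[:, None] * root[None, :])

def fisher_quadratic(s, n_cells=20, alpha=1.0, beta=0.0):
    """Eq. (17): I_f(s) = h'(s)^T Sigma^{-1}(s) h'(s) for h_n(s) = h(s - n)."""
    x = s - np.arange(1, n_cells + 1)
    slopes = h_prime(x)
    return slopes @ precision_matrix(h(x), alpha, beta) @ slopes

def fisher_sums(s, n_cells=20, alpha=1.0, beta=0.0):
    """The same quantity as a diagonal sum plus nearest-neighbour cross terms."""
    x = s - np.arange(1, n_cells + 1)
    rates, slopes = h(x), h_prime(x)
    diag = np.sum(slopes**2 / rates)
    # Each neighbouring pair (n, n+1) appears twice in the sum over m = n +/- 1.
    cross = 2.0 * np.sum(slopes[:-1] * slopes[1:] / np.sqrt(rates[:-1] * rates[1:]))
    return alpha * diag + beta * cross

for s0 in (10.0, 10.3, 10.5):
    q = fisher_quadratic(s0, alpha=1.3, beta=0.2)
    t = fisher_sums(s0, alpha=1.3, beta=0.2)
    print(f"s = {s0}: quadratic form = {q:.4f}, sum form = {t:.4f}")
```

Setting the variance scale to 1 and the correlation strength to 0 in this sketch recovers the independent Poisson expression for the Fisher information.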

6 Discussion 

We have examined the influence of sensory priors on the optimal allocation of neural resources, as well as 
the implications of this optimal allocation on subsequent perception. For a family of objective functions, we 
obtain closed-form solutions specifying power law relationships between the prior probability distribution 
of a variable in the environment, the tuning properties of a population that encodes that variable, and 
the minimum perceptual discrimination thresholds achievable for that variable. The predictions are easily 
testable, and preliminary evidence indicates that the infomax solution is consistent with physiological and 
perceptual data for several sensory attributes [27, 28].

Our analysis requires several approximations and assumptions in order to arrive at an analytical solution. 
First, we rely on lower bounds on mutual information and discriminability, each based on Fisher information. 
Fisher information is known to provide a poor bound on mutual information when there are a small number 
of neurons, a short decoding time, or non-smooth tuning curves |24(I41|. It also provides a poor bound 
on supra-threshold discriminability [30,12]. It is worth noting, however, we do not require the bounds on 
either information or discriminability to be tight, but rather that their optima be close to that of their 
corresponding true objective functions. In addition, our preliminary evidence indicates that, at least for 
typical experimental settings, both physiological and perceptual data appear to be consistent with the 
infomax version of our theory [27,28]. We also made several assumptions in deriving our results: (1) the
tuning curves, h(D(s) − n), or in the monotonic case their derivatives, h'(D(s) − n), evenly tile the stimulus
space; (2) the single neuron Fisher informations, φ(D(s) − n), evenly tile the stimulus space; and (3) the
gain function, g(s), varies slowly and smoothly over the width of φ(D(s) − n). These assumptions allow us
to approximate Fisher information in terms of cell density and gain (Fig. 3(e)), to express the resource
constraints in simple form, and to obtain a closed-form solution to the optimization problem. 
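
The tiling assumption (2) can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes a Gaussian tuning curve h in the warped coordinate D(s), unit spacing between neurons, and independent Poisson noise, so that the single neuron Fisher information kernel is h'(x)^2 / h(x). The function names and the choice of widths are illustrative assumptions.

```python
import math

def phi(x, sigma):
    """Fisher information kernel h'(x)^2 / h(x) for a Gaussian h (assumed shape).

    For h(x) = exp(-x^2 / (2 sigma^2)) this simplifies to (x / sigma^2)^2 * h(x),
    which avoids dividing by a value that underflows to zero far from the peak.
    """
    h = math.exp(-x * x / (2.0 * sigma * sigma))
    return (x / (sigma * sigma)) ** 2 * h

def tiled_sum(x, sigma, n_max=50):
    """Sum of phi(x - n) over unit-spaced neurons n = -n_max..n_max."""
    return sum(phi(x - n, sigma) for n in range(-n_max, n_max + 1))

def tiling_ripple(sigma, samples=200):
    """Relative peak-to-peak variation of the tiled sum across one unit cell."""
    values = [tiled_sum(k / samples, sigma) for k in range(samples)]
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean

if __name__ == "__main__":
    for sigma in (0.2, 0.5, 1.0):
        print(sigma, tiling_ripple(sigma))
```

With a width comparable to the spacing (sigma = 1), the sum is nearly constant and close to the integral of the kernel, sqrt(2 pi) / sigma, so the population Fisher information depends on the stimulus only through density and gain. With tuning much narrower than the spacing (sigma = 0.2), the sum fluctuates strongly and the approximation breaks down.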

Our framework offers an important generalization of the population coding literature, allowing for
non-uniformity of sensory priors, and corresponding heterogeneity in tuning and gain properties. Nevertheless, it
suffers from the main simplification found throughout that literature: the tuning curve response model is
restricted to a single (one-dimensional) stimulus attribute. Real sensory neurons exhibit selectivity for multiple
attributes. If the environmental distribution (prior) for those attributes is separable (i.e., if the values of those
attributes are statistically independent), then an efficient code can be constructed separably. That is, each
neuron could have joint tuning arising from the product of a tuning curve for each attribute. But extending
the theory to handle multiple attributes with statistical dependencies is not straightforward.

Our formulation assumes the sensory attribute of interest is drawn from a fixed and stable distribution. 
However, the distribution of sensory inputs can vary according to context, and it is of interest to consider 
how the theory might be generalized to adjust to such changes. A potential clue comes from the physiology: 
a large body of literature describes adaptive changes in neural gain at time scales ranging from milliseconds 
to hours. These have been interpreted as homeostatic mechanisms whose purpose is to maintain a high 
level of information transmission [43–45]. How does this fit with our predictions? A potential interpretation
is that tuning curves, which presumably arise from the strength of synaptic connections, are established 
and adjusted over slow time scales, so as to efficiently capture the heterogeneities in stable environmental 
distributions, whereas the gains of individual neurons are adjusted more rapidly, so as to adapt to fluctuating 
heterogeneities in input intensity, local metabolic resources, or task-related requirements. 

Finally, the structure of our optimal population has direct implications for Bayesian decoding, a problem that 
has received much attention in recent literature [e.g., 13,14,46–48]. A Bayesian decoder must have knowledge
of prior probabilities, but an often-overlooked issue is how such knowledge is obtained and represented in
the brain [47]. Our efficient coding solution provides a mechanism whereby the prior is implicitly encoded in
the arrangement and gains of tuning curves. Recent publications [49–52] have proposed that a population-vector
computation (i.e., the average of the preferred stimuli of the neurons, weighted by their corresponding
responses), coupled with an inhomogeneous arrangement of tuning curves, could provide a simple means 
for the brain to approximate a Bayesian estimate. For the case of an infomax population, we have derived 
a decoder that more closely approximates a Bayesian least-squares estimator. Similar to the population 
vector, it computes a weighted average of the preferred stimuli, but the weights are constructed by linearly 
combining and then exponentiating the responses [28,53]. Thus, efficient population representations may
offer unforeseen benefits for explaining subsequent stages of sensory processing. 
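
For concreteness, the population-vector computation described above (the response-weighted average of preferred stimuli) can be sketched as follows. This illustrates only that baseline readout; the weights of the decoder that linearly combines and exponentiates responses are not specified in this section and are not reproduced here. The function name and the example values are hypothetical.

```python
def population_vector(preferred, responses):
    """Average of the neurons' preferred stimuli, weighted by their responses.

    preferred: preferred stimulus value of each neuron
    responses: spike count (or rate) of each neuron for the current stimulus
    """
    total = sum(responses)
    if total == 0:
        raise ValueError("no spikes: the population vector is undefined")
    return sum(s * r for s, r in zip(preferred, responses)) / total

if __name__ == "__main__":
    # Three neurons preferring 0, 1 and 2; the middle one responds most strongly.
    print(population_vector([0.0, 1.0, 2.0], [1, 2, 1]))
```

Because the estimate is an average over preferred stimuli, an inhomogeneous arrangement of tuning curves pulls it toward regions where cells are dense, which is how such a readout could come to reflect the prior.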

A Solution for the infomax objective function with bell-shaped tuning curves



The optimum of the objective function in Eq. (11) is easily determined using the method of Lagrange
multipliers. As an example, consider the case when f(·) = log(·), which corresponds to optimization of the
Fisher bound on mutual information between the input signal and the population response. The Lagrangian 
for this case is expressed as: 

\[
L\big(d(s), g(s), \lambda_1, \lambda_2\big) = \int p(s) \log\!\big(d^2(s)\, g(s)\big)\, ds + \lambda_1 \left( \int d(s)\, ds - N \right) + \lambda_2 \left( \int p(s)\, g(s)\, ds - R \right).
\]

The optimal cell density and gain that satisfy the resource constraints are determined by setting the varia- 
tional gradient of the Lagrangian to zero, and solving the resulting system of equations: 

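\begin{align}
\frac{\partial L}{\partial d(s)} &= 2\, p(s)\, d^{-1}(s) + \lambda_1 = 0 \tag{21} \\
\frac{\partial L}{\partial g(s)} &= p(s)\, g^{-1}(s) + \lambda_2\, p(s) = 0 \tag{22} \\
\frac{\partial L}{\partial \lambda_1} &= \int d(s)\, ds - N = 0 \tag{23} \\
\frac{\partial L}{\partial \lambda_2} &= \int p(s)\, g(s)\, ds - R = 0. \tag{24}
\end{align}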

The optimal cell density and gain are determined from Eqs. (21) & (22) as:

\begin{align}
d(s) &= -2 \lambda_1^{-1}\, p(s) \tag{25} \\
g(s) &= -\lambda_2^{-1} \tag{26}
\end{align}

The unknown Lagrange multipliers can be determined by substituting these expressions into Eqs. (23) & (24)
and solving. The result is:

\begin{align*}
\lambda_1 &= -2 N^{-1} \\
\lambda_2 &= -R^{-1}
\end{align*}

Substituting these expressions into Eqs. (25) & (26) yields the infomax solution expressed in Table 1. The
same method was used to derive the rest of the solutions expressed in Tables 1 & 2.
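
The derivation can be checked numerically. Substituting the multipliers into Eqs. (25) and (26) gives d(s) = N p(s) and g(s) = R. The sketch below, offered as an illustration under stated assumptions rather than as the authors' procedure, discretizes a hypothetical prior p(s) = 1 + 0.5 cos(2πs) on [0, 1), builds the allocation from the multipliers, and compares the infomax objective against a uniform density that satisfies the same constraint. The function names, the prior, and the values N = 100 and R = 10 are all assumptions made for the example.

```python
import math

def infomax_allocation(prior, n_cells, rate_budget):
    """Density and gain from Eqs. (25)-(26) after substituting the multipliers."""
    lam1 = -2.0 / n_cells        # lambda_1 = -2 / N
    lam2 = -1.0 / rate_budget    # lambda_2 = -1 / R
    density = [-2.0 / lam1 * p for p in prior]   # d(s) = N p(s)
    gain = [-1.0 / lam2 for _ in prior]          # g(s) = R
    return density, gain

def objective(prior, density, gain, ds):
    """Riemann-sum estimate of the integral of p(s) log(d(s)^2 g(s))."""
    return sum(p * math.log(d * d * g)
               for p, d, g in zip(prior, density, gain)) * ds

def example_prior(k=1000):
    """Hypothetical prior p(s) = 1 + 0.5 cos(2 pi s) on [0, 1), sampled at midpoints."""
    ds = 1.0 / k
    prior = [1.0 + 0.5 * math.cos(2.0 * math.pi * (i + 0.5) * ds) for i in range(k)]
    return prior, ds

if __name__ == "__main__":
    prior, ds = example_prior()
    density, gain = infomax_allocation(prior, n_cells=100, rate_budget=10.0)
    print("cells used:", sum(density) * ds)
    print("mean rate :", sum(p * g for p, g in zip(prior, gain)) * ds)
    uniform = [100.0] * len(prior)   # same number of cells, spread evenly
    print("objective gap:", objective(prior, density, gain, ds)
          - objective(prior, uniform, gain, ds))
```

The gap between the two objectives equals twice the Kullback-Leibler divergence between the prior and the uniform distribution on the interval, so it is positive whenever the prior is non-uniform.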